65 research outputs found

    Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes

    Full text link
    PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, particularly so for margin classifiers: recent contributions have shown how practical these bounds can be, either to perform model selection (Ambroladze et al., 2007) or even to directly guide the learning of linear classifiers (Germain et al., 2009). However, there are many practical situations where the training data show some dependencies and where the traditional IID assumption does not hold. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first, to the best of our knowledge, PAC-Bayes generalization bounds for classifiers trained on data exhibiting interdependencies. The approach undertaken to establish our results is based on the decomposition of a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data, thanks to graph fractional covers. Our bounds are very general, since being able to find an upper bound on the fractional chromatic number of the dependency graph is sufficient to obtain new PAC-Bayes bounds for specific settings. We show how our results can be used to derive bounds for ranking statistics (such as the AUC) and for classifiers trained on data distributed according to a stationary β-mixing process. Along the way, we show how our approach seamlessly allows us to deal with U-processes. As a side note, we also provide a PAC-Bayes generalization bound for classifiers learned on data from stationary φ-mixing distributions.
    Comment: Long version of the AISTATS 09 paper: http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
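    The decomposition idea above can be illustrated with a minimal sketch: a proper coloring of the dependency graph partitions the data into independent sets, and the number of colors (an upper bound on the fractional chromatic number) divides the effective sample size entering the bound. The chain-structured graph below is an illustrative stand-in for a 1-dependent process, not the paper's exact construction.

```python
# Hypothetical sketch: a proper graph coloring splits dependent samples into
# sets of mutually independent samples; with chi colors, the chromatic
# PAC-Bayes bound behaves roughly as if the sample size were n / chi.

def greedy_coloring(n, edges):
    """Greedy proper coloring; the color count upper-bounds the
    fractional chromatic number of the dependency graph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

n = 10
chain_edges = [(i, i + 1) for i in range(n - 1)]  # each sample depends on its neighbor
colors = greedy_coloring(n, chain_edges)
chi = max(colors.values()) + 1   # = 2 for a chain
effective_n = n / chi            # effective sample size in the chromatic bound
print(chi, effective_n)          # 2 5.0
```

    Each color class is an independent set, so a standard IID PAC-Bayes bound applies within it; the fractional-cover argument then recombines the classes.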

    Learning the optimal scale for GWAS through hierarchical SNP aggregation

    Full text link
    Motivation: Genome-Wide Association Studies (GWAS) seek to identify causal genomic variants associated with rare human diseases. The classical statistical approach for detecting these variants is based on univariate hypothesis testing, with healthy individuals being tested against affected individuals at each locus. Given that an individual's genotype is characterized by up to one million SNPs, this approach lacks precision, since it may yield a large number of false positives that can lead to erroneous conclusions about genetic associations with the disease. One way to improve the detection of true genetic associations is to reduce the number of hypotheses to be tested by grouping SNPs. Results: We propose a dimension-reduction approach which can be applied in the context of GWAS by making use of the haplotype structure of the human genome. We compare our method with standard univariate and multivariate approaches on both synthetic and real GWAS data, and we show that reducing the dimension of the predictor matrix by aggregating SNPs gives greater precision in the detection of associations between the phenotype and genomic regions.
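    The aggregation step can be sketched as follows; this is an illustrative reduction (contiguous fixed-size blocks, simple averaging), not the paper's exact hierarchical procedure, and the block size is an arbitrary choice here.

```python
# Hypothetical sketch: aggregate contiguous SNPs into haplotype-sized blocks
# and test one hypothesis per block instead of one per SNP, shrinking the
# multiple-testing burden (e.g. a less severe Bonferroni correction).

def aggregate_snps(genotypes, block_size):
    """Average each individual's SNP values (0/1/2 minor-allele counts)
    over contiguous blocks of `block_size` loci."""
    aggregated = []
    for row in genotypes:
        blocks = [row[i:i + block_size] for i in range(0, len(row), block_size)]
        aggregated.append([sum(b) / len(b) for b in blocks])
    return aggregated

genotypes = [
    [0, 1, 2, 1, 0, 0],   # individual 1, six SNPs
    [2, 2, 1, 0, 1, 1],   # individual 2
]
agg = aggregate_snps(genotypes, block_size=3)
n_tests_before, n_tests_after = 6, len(agg[0])
print(n_tests_before, n_tests_after)  # 6 2
```

    With 2 tests instead of 6, a Bonferroni-corrected threshold is three times looser, which is the precision gain the aggregation is after.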

    Composite kernel learning

    Get PDF
    The Support Vector Machine (SVM) is an acknowledged powerful tool for building classifiers, but it lacks flexibility, in the sense that the kernel is chosen prior to learning. Multiple Kernel Learning (MKL) makes it possible to learn the kernel from an ensemble of basis kernels, whose combination is optimized in the learning process. Here, we propose Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels. Our formulation of the learning problem encompasses several setups, putting more or less emphasis on the group structure. We characterize the convexity of the learning problem and provide a general wrapper algorithm for computing solutions. Finally, we illustrate the behavior of our method on multi-channel data, where groups correspond to channels.
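    The kernel combination at the heart of MKL-style methods can be sketched in a few lines. The weights below are fixed by hand for illustration; in Composite Kernel Learning they would be optimized jointly with the SVM, with a group structure (e.g. one group per channel) imposed on them.

```python
# Minimal sketch: the learned kernel is a nonnegative weighted sum of basis
# Gram matrices, K = sum_m d_m * K_m, computed entrywise.

def combine_kernels(grams, weights):
    """Return the entrywise weighted sum of the basis Gram matrices."""
    n = len(grams[0])
    K = [[0.0] * n for _ in range(n)]
    for w, G in zip(weights, grams):
        for i in range(n):
            for j in range(n):
                K[i][j] += w * G[i][j]
    return K

K1 = [[1.0, 0.2], [0.2, 1.0]]   # basis kernel, e.g. channel 1
K2 = [[1.0, 0.8], [0.8, 1.0]]   # basis kernel, e.g. channel 2
K = combine_kernels([K1, K2], weights=[0.5, 0.5])
print(K)  # [[1.0, 0.5], [0.5, 1.0]]
```

    A sum of positive semidefinite matrices with nonnegative weights is again positive semidefinite, so the combined K remains a valid kernel.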

    Semi-supervised output kernel regression for link prediction

    Get PDF
    We address the link prediction problem as the task of learning an output kernel by semi-supervised output kernel regression. Working within the theory of operator-valued reproducing kernel Hilbert spaces for vector-valued functions, we establish a new representer theorem dedicated to semi-supervised regression under a penalized least-squares criterion. We then choose an operator-valued kernel defined from a scalar-valued input kernel and build a Hilbert space with this kernel as its reproducing kernel, to which we apply the representer theorem. Minimizing the penalized least-squares criterion in this framework leads to an analytical solution, as in ridge regression, which is thereby extended. We study the relevance of this new semi-supervised approach for transductive link prediction. Artificial datasets support our study, and two real-world applications are then treated using a very low percentage of labeled data.

    Protein-protein interaction network inference with semi-supervised Output Kernel Regression

    Get PDF
    In this work, we address the problem of protein-protein interaction network inference as a semi-supervised output kernel learning problem. Using the kernel trick in the output space allows one to reduce the problem of learning from pairs to learning a single-variable function with values in a Hilbert space. We turn to the Reproducing Kernel Hilbert Space theory devoted to vector-valued functions, which provides us with a general framework for output kernel regression. In this framework, we propose a novel method which allows us to extend Output Kernel Regression to semi-supervised learning. We study the relevance of this approach on transductive link prediction using artificial data and a protein-protein interaction network of S. cerevisiae, using a very low percentage of labeled data.
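    The ridge-style closed form behind output kernel regression can be sketched on a toy case: with input Gram matrix K and (output-)kernel-encoded targets Y, penalized least squares gives coefficients C = (K + λI)⁻¹Y. The 2x2 explicit inverse below just keeps the example dependency-free; the semi-supervised extension described above would further augment this system with unlabeled points.

```python
# Hypothetical sketch of the analytical solution: solve (K + lam*I) c = y
# for a 2x2 Gram matrix via its explicit inverse (Cramer's rule).

def kernel_ridge_2x2(K, y, lam):
    """Return c solving (K + lam*I) c = y for a 2x2 Gram matrix K."""
    a, b = K[0][0] + lam, K[0][1]
    c, d = K[1][0], K[1][1] + lam
    det = a * d - b * c
    return [(d * y[0] - b * y[1]) / det, (a * y[1] - c * y[0]) / det]

K = [[1.0, 0.5], [0.5, 1.0]]          # toy input Gram matrix
coef = kernel_ridge_2x2(K, y=[1.0, 0.0], lam=0.1)
# Verify the solution satisfies the first equation of (K + lam*I) c = y
resid0 = (K[0][0] + 0.1) * coef[0] + K[0][1] * coef[1] - 1.0
print(abs(resid0) < 1e-12)  # True
```

    For n points the same formula holds with a general linear solve; the analytical form is what the representer theorem buys over iterative training.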

    Mutations in the Polycomb Group Gene polyhomeotic Lead to Epithelial Instability in both the Ovary and Wing Imaginal Disc in Drosophila

    Get PDF
    Most human cancers originate from epithelial tissues, and cell polarity and adhesion defects can lead to metastasis. The Polycomb-Group of chromatin factors were first characterized in Drosophila as repressors of homeotic genes during development, while studies in mammals indicate a conserved role in body plan organization, as well as an implication in other processes such as stem cell maintenance, cell proliferation, and tumorigenesis. We have analyzed the function of the Drosophila Polycomb-Group gene polyhomeotic in epithelial cells of two different organs, the ovary and the wing imaginal disc.
    Clonal analysis of loss and gain of function of polyhomeotic resulted in segregation between mutant and wild-type cells in both the follicular and wing imaginal disc epithelia, without excessive cell proliferation. Both basal and apical expulsion of mutant cells was observed, the former characterized by specific reorganization of cell adhesion and polarity proteins, the latter by complete cytoplasmic diffusion of these proteins. Among several candidate target genes tested, only the homeotic gene Abdominal-B was a target of PH in both ovarian and wing disc cells. Although overexpression of Abdominal-B was sufficient to cause cell segregation in the wing disc, epistatic analysis indicated that the presence of Abdominal-B is not necessary for expulsion of polyhomeotic mutant epithelial cells, suggesting that additional polyhomeotic targets are implicated in this phenomenon.
    Our results indicate that polyhomeotic mutations have a direct effect on epithelial integrity that can be uncoupled from overproliferation. We show that cells in an epithelium expressing different levels of polyhomeotic sort out, indicating differential adhesive properties between the cell populations. Interestingly, we found distinct modalities between apical and basal expulsion of ph mutant cells, and further studies of this phenomenon should allow parallels to be made with the modified adhesive and polarity properties of different types of epithelial tumors.

    Kernel pyramids

    No full text
    Statistical learning aims not only at prediction, but also at analyzing or interpreting a phenomenon. We propose to guide the learning process by integrating knowledge about how the similarities between examples are organized. This knowledge is represented by a "kernel pyramid", a tree structure that organizes distinct groups and subgroups of similarities. Under the assumption that few (groups of) similarities are relevant for discriminating between observations, our approach makes the relevant groups and subgroups of similarities emerge. We propose here the first complete solution to this problem, allowing a support vector machine (SVM) to be learned on kernel pyramids of arbitrary height. The weights of the (groups of) similarities are learned jointly with the SVM parameters, by optimizing a criterion that we show to be a variational formulation of a problem regularized by a mixed norm. We illustrate our approach on a facial expression recognition problem, where image characteristics are described by a pyramid representing the spatial organization and scale of wavelet filters applied to image patches.

    Hierarchical penalties for integrating knowledge into statistical models

    No full text
    Supervised learning aims not only at predicting, but also at analyzing or interpreting an observed phenomenon. Hierarchical penalization is a generic framework for integrating prior information into the fitting of statistical models. This prior information represents the relations shared by the characteristics of a given problem. In this thesis, the characteristics are organized in a two-level tree structure, which defines distinct groups. The assumption is that few (groups of) characteristics are involved in discriminating between observations. Thus, for a learning problem, the goal is to identify the relevant groups of characteristics and, at the same time, the significant characteristics within these groups. An adaptive penalization formulation is used to extract the significant components of each level. We show that the solution of this problem is equivalent to minimizing a problem regularized by a mixed norm. These two approaches have been used to study the convexity and sparsity properties of the method, which is derived in parametric and nonparametric function spaces. Experiments on brain-computer interface problems support our approach.
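    The mixed-norm regularizer mentioned above can be made concrete with a small sketch: an ℓ1 norm across groups of an ℓ2 norm within each group, which drives whole groups of coefficients to zero. The coefficient vector and group boundaries below are arbitrary illustrations, not from the thesis.

```python
# Illustrative mixed l1/l2 (group) norm: sum over groups of the Euclidean
# norm of the group's coefficients. Penalizing it favors solutions where
# entire groups vanish, matching the "few relevant groups" assumption.
import math

def mixed_norm(coefs, groups):
    """Return sum_g sqrt(sum_{i in g} coefs[i]**2)."""
    return sum(math.sqrt(sum(coefs[i] ** 2 for i in g)) for g in groups)

w = [3.0, 4.0, 0.0, 0.0]        # second group entirely zero
groups = [[0, 1], [2, 3]]
print(mixed_norm(w, groups))     # 5.0: sqrt(9 + 16) from group 1, 0 from group 2
```

    Setting each group to a singleton recovers the plain ℓ1 norm (lasso), while a single group recovers the ℓ2 norm; the two-level tree structure interpolates between these regimes.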

    The Artist Edgar Degas and His Portrayal of Women

    No full text